Ian Mc Farlane, Bram Stults
2025-04-16
Stakeholder
Mission
Deployed bioinformatic models to analyze diseases using genomic expression data
Focused on CVID (common variable immunodeficiency): a disease that is clinically defined but not well defined at the molecular level
Organized a complete codebase to support immunological research and publication
Context
This analysis follows the work of Dr. Paul Maglione: ‘Convergence of cytokine dysregulation and antibody deficiency in common variable immunodeficiency with inflammatory complications’
We hope our analysis will contribute as a support system and a viewpoint:
Locally
Globally
Expanding understanding of a less-understood immunological topic and contributing to the body of evidence
Supporting the immunological research community and, we hope, providing a data point that one day improves care
Dr. Johnson provided three separate data sets directly. These were developed or measured by research teams at Boston University and Rutgers:
As with other genomic studies, this analysis was subject to the “small N, big P” problem: far more measured genes (P) than patients (N)
We learned and used the SummarizedExperiment data structure (an S4 class from Bioconductor). This structure packages count data with patient metadata into a single object.
Patient anonymity was a key priority and was maintained throughout.
Maintained separate databases while sharing versions through Git to prevent leaking sensitive data
General data cleaning (renaming variables, filtering outliers, splitting datasets)
Augmented the data by converting gene expression counts into: counts per million (CPM), log-counts, and log-CPM
Performed moderate exploratory data analysis (EDA) to identify and label appropriate treatment groups
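The count transformations listed above (CPM, log-counts, log-CPM) can be sketched for a single sample as follows. This is a minimal illustration in plain Python, not the project's actual code. The gene names and counts are made up, and the base-2 logarithm and pseudocount of 1 are assumptions, since the source does not state which log base or offset was used.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw gene counts so that they sum to one million."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no reads; CPM is undefined")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


def log_counts(counts, pseudocount=1.0):
    """Log2-transform values; the pseudocount (an assumption) avoids log(0)."""
    return {gene: math.log2(c + pseudocount) for gene, c in counts.items()}


def log_cpm(counts, pseudocount=1.0):
    """Apply the CPM scaling first, then the same log2 transform."""
    return log_counts(counts_per_million(counts), pseudocount)


# Hypothetical raw counts for one sample (three genes, 1000 reads in total).
sample = {"GENE_A": 10, "GENE_B": 0, "GENE_C": 990}

cpm = counts_per_million(sample)
print(cpm)
print(log_counts(sample))
print(log_cpm(sample))
```

Dividing by the library size makes samples with different sequencing depth comparable, and the log transform compresses the wide dynamic range of expression counts. Established packages may handle the offset differently from this sketch.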